Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW
Abstract
Collective communication is one of the most powerful message-passing concepts: it lets parallel applications express complex communication patterns while allowing the underlying MPI library to provide efficient implementations that minimize the cost of data movement. However, as heterogeneity inside the nodes increases, particularly in the memory hierarchies, harnessing the maximum compute capability becomes increasingly difficult. This paper investigates the impact of kernel-assisted MPI communication on two scientific applications: 1) Car-Parrinello molecular dynamics (CPMD), a chemical molecular dynamics application, and 2) FFTW, a Discrete Fourier Transform (DFT) library. By focusing on how each application uses the Message Passing Interface (MPI), we identified its communication characteristics and patterns. Our experiments indicate that the quality of the collective communication implementation on a specific machine plays a critical role in overall application performance.
Similar resources
Fourier Transforms for the BlueGene/L Communication Network
A computational kernel of particular importance for many scientific applications is the Fast Fourier Transform (FFT) of multi-dimensional data. A fundamental challenge is the design and implementation of such parallel numerical algorithms to utilise efficiently thousands of nodes. The BlueGene/L is a massively parallel high performance computer organised as a three-dimensional torus of compute ...
Optimizing Collective Communication in OpenSHMEM
Message Passing Interface (MPI) has been the de-facto programming model for scientific parallel applications. However, data driven applications with irregular communication patterns are harder to implement using MPI. The Partitioned Global Address Space (PGAS) programming models present an alternative approach to improve programmability. OpenSHMEM is a library-based implementation of the PGAS m...
3D FFT with 2D decomposition
Many scientific applications, including molecular dynamics (MD), require a fast Fourier transform (FFT). As the number of processors in high-performance computers increases, this transform has to be parallelized across a larger number of processors so that it does not become a bottleneck for parallelization. This requires the decomposition to be changed from 1D to 2D. Such a 2D decomposed 3D FFT was implemente...
Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer
This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Adv...
Design and Performance Evaluation of LiMIC (Linux Kernel Module for MPI Intra-node Communication) on InfiniBand Cluster
High-performance intra-node communication support for MPI applications is critical for achieving the best performance out of clusters of SMP workstations. Although the performance of system area networks has improved in recent years, intra-node communication still remains orders of magnitude faster than the network. Present day MPI stacks cannot make use of operating system kernel support f...